Rough clustering of sequential data

Authors

Pradeep Kumar

P. Radha Krishna

Raju S. Bapi

Supriya Kumar De

Abstract

This paper presents a new indiscernibility-based rough agglomerative hierarchical clustering algorithm for sequential data. In this approach, the indiscernibility relation is extended to a tolerance relation by relaxing the transitivity property. Initial clusters are formed using a similarity upper approximation. Subsequent clusters are formed using a constrained-similarity upper approximation, in which a condition of relative similarity serves as the merging criterion. We report experimental results on the msnbc web navigation dataset, which is intrinsically sequential in nature. We compare the results of the proposed approach with those of the traditional hierarchical clustering algorithm using vector coding of sequences. The results establish the viability of the proposed approach. The rough clusters produced by the proposed algorithm provide interpretations of the different navigation orientations of users present in the sessions without having to fit each object into only one group. Such descriptions can help web miners identify potential and meaningful groups of users. © 2007 Elsevier B.V. All rights reserved.
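The initial clustering step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define the similarity measure, so the LCS-based `similarity` function, the `threshold` parameter, and the toy sessions below are all assumptions chosen for illustration. The sketch covers only the similarity upper approximation; the constrained-similarity step is omitted because its merging condition is not specified here.

```python
from typing import Dict, List, Sequence, Set


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Hypothetical order-aware similarity in [0, 1]: LCS length over the longer length."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


def similarity_upper_approximation(
    sessions: Dict[str, List[str]], threshold: float
) -> Dict[str, Set[str]]:
    """For each session, collect every session at least `threshold` similar to it.

    The resulting relation is reflexive and symmetric but not necessarily
    transitive, i.e. a tolerance relation, so the sets may overlap.
    """
    return {
        i: {j for j, t in sessions.items() if similarity(s, t) >= threshold}
        for i, s in sessions.items()
    }


# Toy navigation sessions (made-up data, not taken from the msnbc dataset).
sessions = {
    "s1": ["news", "sports", "weather"],
    "s2": ["news", "sports", "tech"],
    "s3": ["health", "sports", "tech"],
}
clusters = similarity_upper_approximation(sessions, threshold=0.6)
for name in sorted(clusters):
    print(name, sorted(clusters[name]))
```

With these toy sessions, `s1` is similar to `s2` and `s2` is similar to `s3`, but `s1` is not similar to `s3`, which shows the relaxed transitivity. Session `s2` therefore appears in all three initial clusters rather than being forced into a single group.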


Similar resources

Scalable Sequential Rough Parallel Bounded Symmetrical Clustering for Gene Expression Profile Analysis

The study on gene expression profiling of tissues and cells has become a major tool for discovery in medicine. Identification of co-expressed genes and coherent patterns is the central goal in gene expression profiling and the important task in the field of bioinformatics research. Clustering is an important unsupervised learning technique for Gene Expression Profile Analysis. Many conventional...


Neighborhood Clustering of Web Users With Rough K-Means

Data collection and analysis in web mining faces certain unique challenges. Due to a variety of reasons inherent in web browsing and web logging, the likelihood of bad or incomplete data is higher than conventional applications. The analytical techniques in web mining need to accommodate such data. Fuzzy and rough sets provide the ability to deal with incomplete and approximate information. Fuz...


Matrix Sequential Hybrid Credit Scorecard Based on Logistic Regression and Clustering

The Basel II Accord pointed out the benefits of credit risk management through internal models to estimate Probability of Default (PD). Banks use default predictions to estimate the loan applicants’ PD. However, in practice, PD is not useful and banks apply credit scorecards in their decision-making process. Also, competitive pressures in the lending industry have forced banks to use profit scorecards...


Does Fundraising Have Meaningful Sequential Patterns? The Case of Fintech Startups

Nowadays, fundraising is one of the most important issues for both Fintech investors and startups. The pattern of fundraising, in terms of the “number and type of rounds and stages needed”, is important. The diverse features and factors stemming from Fintech business models that can influence success are among the key issues in shaping these patterns. This study applied the top 100 KPMG Fintech...


Rough Set based Rule Induction Package for R

Rough set theory is a framework for dealing with uncertainty based on the computation of equivalence relations/classes. Since a probability is defined as a measure on a sample space, defined by equivalence classes, rough sets are closely related to probabilities at a deep level of mathematics. Furthermore, since rough sets are closely related to Dempster-Shafer theory or fuzzy sets, this theory can...



Journal title:

Data Knowl. Eng.

Volume 63

Publication date: 2007
